The Multi-Branched Method of Moments for Queueing Networks 



Giuliano Casale 
SAP Research 
TEIC Building, Shore Road 
Newtownabbey, BT37 OQB, UK 
giuliano. casale @ ieee.com 



Abstract 

We propose a new exact solution algorithm for closed 
multiclass product-form queueing networks that is several 
orders of magnitude faster and less memory consuming 
than established methods for multiclass models, such as the 
Mean Value Analysis fMVAj algorithm. The technique is an 
important generalization of the recently proposed Method 
of Moments fMoMj which, differently from MVA, recur- 
sively computes higher- order moments of queue-lengths in- 
stead of mean values. 

The main contribution of this paper is to prove that the 
information used in the MoM recursion can be increased 
by considering multiple recursive branches that evaluate 
models with different number of queues. This reformula- 
tion allows to formulate a simpler matrix difference equa- 
tion which leads to large computational savings with re- 
spect to the original MoM recursion. Computational analy- 
sis shows several cases where the proposed algorithm is be- 
tween 1, 000 and 10, 000 times faster and less memory con- 
suming than the original MoM, thus extending the range of 
multiclass models where exact solutions are feasible. 



1. Introduction 

Product-form queueing networks [2] are popular 
stochastic models used in capacity planning of com- 
puter architectures and networks with the purpose of evalu- 
ating the effect of resource sharing on scalability. In many 
applications, notably modern multi-tier architectures host- 
ing web sites and intranet applications, workloads are best 
described as multiclass, that is, requests are assigned to dif- 
ferent categories according to the statistical characteris- 
tics of their demand at the different servers. Yet, multiclass 
workloads are extremely challenging to analyze in queue- 
ing networks even using state-of-the-art solution techniques 
such as Mean Value Analysis (MVA) [23], the Convo- 
lution Algorithm [6, 22], RECAL [13], LBANC [10], or 



more recent methods based on the generating function ap- 
proach [3, 1 1, 17]. The main problem is that multiclass mod- 
els typically involve at least four or five classes, hundreds or 
thousands of competing requests, and many servers. Yet, es- 
tablished exact solution methods require computational 
costs which are prohibitive for models of this size, e.g., 
memory requirements are usually of the order of many ter- 
abytes or more. As a result, multiclass networks cannot be 
usually solved with exact techniques and the focus is on ap- 
proximation methods [1,9, 14,24], which yet cannot return 
probabilistic measures because they ignore the normaliz- 
ing constant of the Markov chain underlying the queueing 
network. 

Recently, we have proposed the Method of Moments 
(MoM) [7, 8], a new exact technique for multiclass models 
that recursively computes higher-order moments of queue- 
length instead of mean values Uke the MVA approach. The 
MoM approach is based on normalizing constants, thus 
it can also compute probabilistic measures that cannot be 
evaluated by the MVA algorithm. More importantly, the 
higher-order moments approach is much more scalable that 
the MVA approach, since the computational costs increase 
at most log-quadratically with the total population in the 
network, whereas they grow exponentially with the num- 
ber of queues or classes in existing methods such as MVA, 
RECAL, or LBANC. Although much more efficient than 
MVA, the MoM approach becomes infeasible if the number 
of queue and classes grows simultaneously [7], thus models 
with many classes and many queues can be hard to analyze 
even with MoM. In order to address this limitation, we pro- 
pose in this paper a generalization of MoM. The proposed 
approach is always more efficient that the original MoM in 
all cases, yet the largest improvements are achieved on mod- 
els with several queues and many classes which are infeasi- 
ble in the original MoM. 

Our idea consists in integrating the recursive equation 
used in the Convolution Algorithm [6, 22] within the MoM 
approach, which jointly considers in a linear matrix differ- 
ence equation the exact recursive formulas for normalizing 



constants used in RECAL [13] and LBANC [10], but not 
those used in Convolution. By integrating a new formula in 
the MoM matrix difference equation we obtain a new com- 
putational scheme which evaluates higher-order moments 
of queue-lengths on models with different populations and, 
as a result of the generalization, also on models with differ- 
ent number of queues. The main advantage of this approach 
is that the size of the matrix recurrence equation solved at 
each step of the recursion is much smaller that the one used 
in the original MoM approach. This is a fundamental im- 
provement since linear system solution required to solve the 
matrix recurrence grows quadratically or cubically with the 
coefficient matrix order. In particular, we show that even us- 
ing a multi-branched recursion on hundreds or thousands 
of models with different number of queues, the generalized 
MoM is much more efficient than the original MoM which 
does not consider models with different number of queues. 

The remainder of this paper is organized as follows. Af- 
ter giving background in Section |2] we use in Section |3] a 
simple multiclass model to illustrate MoM and the princi- 
ples of the generalization proposed in this paper The anal- 
ysis of the effects of the multi-branched recursion on mod- 
els with different number of queues is derived in Section H) 
where we give in Theorem[T]and Theorem|2]the main theo- 
retical results of this paper. Computational complexity of 
the resulting algorithm is analyzed in Section |5] Finally, 
Section |6] gives conclusions and outlines possible exten- 
sions of this paper. 

2. Background 

We consider a closed product-form queueing network 
with M distinct queues and R service classes. Jobs are 
routed probabilistically through the queues where they re- 
ceive service; after completing service, all jobs re-enter the 
network with a delay of Zj. units of time which depends 
on the request's service class r = 1, . . . , i?. The mean ser- 
vice demand, i.e., the mean service time multiplied by the 
mean number of visits [15], of class-r jobs at queue k is in- 
dicated with Dk.r- The number of jobs of class r is the in- 
teger Nr\ we define N — {Ni, N2, ■ ■ ■ , Nj^) as the popula- 
tion vector of the model and N = Ni + N2 + . . . + Njf is 
the total number of jobs circulating in the network. 

We consider the computation of meanjjerformance in- 
dices such as the mean throughput Xr{N) and the mean 
response time Rr{N) — Nr/Xr{N) of class-r jobs; ad- 
ditionally, for each queue k and class r, we are inter- 
ested in computing the utilization Uk,r{N) — Dk^rXr{N), 
the mean queue-length Qk,r{N), and the mean residence 
times Rk.r{N) — Qk,r{N)/Xr{N). These quantities are 
uniquely determined if one knows how to compute ef- 
ficiently throughput and mean queue-lengths, which are 



given by the following ratios [20] : 



Xr{N) = 



G{m, N - Ir) 
G{m,N) ' 



(1) 



Dk,rG{m + U,N-lr) 

hA^) T^T^^xh ' ^ ' 

G(m, N) 



where G{rh, N) denotes the normalizing constant of the 
equilibrium state probabilities of the Markov chain under- 
lying the queueing network [16], 1; indicates a vector com- 
posed by all zeros except for a one in the /th position, and 
rh = (mi , 7712 5 • ■ ■ , 'm-Ai) is the multiplicity vector such that 
the multiplicity 771^ is the number of queues in the model 
with identical service demands D^.i, £'fc,2, • • ■ , D^.r- Ac- 
cording to these definitions, e.g., G(77i + Ifc, — 1^) repre- 
sents the normalizing constant of a model augmented with 
an additional copy of queue k and with a job of class r re- 
moved. Because of the presence of replicated stations, the 
total number of queues in the model is Mfot — Sfcli 
among which only M have distinct demands. 

The advantage of working with normalizing constants 
instead of mean values is that G(777, N) enables the compu- 
tation of probabilistic measures that provide fine-grain in- 
formation about the equilibrium state of the network. For 
instance, for the case 777 = (1,1,...,!) where all queues 
are distinct, the equilibrium state probabilities can be com- 
puted as 



Pr(77l,772, . 



G{Tn, N) 



(3) 



where 77^ = {nk,i,nk^2, ■ • ■ , being 77^^^ the number 

of class-r jobs in queue k in the considered state, C{nk) = 
(Dr "■fc,r!)/'^fe!' and77fc = 77fe^i+77fc^2+- • ■+nkM- Note that 
quantities like Q cannot be computed neither by the MVA 
algorithm nor by local iterative approximations [1,9, 14,24], 
thus the normalization constant approach considered in this 
paper is inherently more general that these methods. 

2.1. Computational Solution 

The analysis of queueing networks can be performed ef- 
ficiently either by approaches that directly evaluate mean 
queue-lengths and throughputs in a recursive fashion, such 
as the Mean Value Analysis (MVA) algorithm [23], or by 
computational methods for the normalizing constants in 
dill, see [5]. The normalizing constant approach is usually 
slightly more efficient, although it can suffer numerical is- 
sues that do not apply to the mean value approach [19]. 
From a probabilistic perspective, the MVA algorithm and 
some methods for the normalizing constant, such as the 
LBANC algorithm [10], can be interpreted as a recursive 



evaluation of mean queue-length^ over models with differ- 
ent population sizes. Yet, we have recently noted in [7, 8] 
that recursively evaluating a set of higher-order moments of 
queue-lengths can be much more efficient computationally 
than computing mean values, while still returning the exact 
solution of the model. The Method of Moments (MoM) [7] 
is an algorithm that implements this higher-order moment 
approach and that we generalize for increased efficiency 
in the next sections; thus we give here a brief overview of 
the method. Due to limited space and thanks to wide avail- 
ability of material on the subject, we point to the literature 
for MVA [23], LBANC [10], RECAL [13], and Convolu- 
tion [6, 22]; comparative analyses can be found in [4, 7]. 

2.1.1. Method of Moments (MoM) MoM computes the 
normalizing constant by simultaneously considering into 
a linear system of equations the following exact formu- 
las for normalizing constants: the convolution expression 
(CE) [10,20] 

R 

G{Th + lk,N)^ G{m, N) + Y, Dk,rG{m +lk,N~ t) 

T = \ 

(4) 

for all 1 < fc < M, and the population constraint (PC) [7, 
13] 

NrGim, N) = ZrG{m, N - U) 

AI 

+ J2mkDk,rG{7n + lk,N -Ir), (5) 

fe=l 

for all 1 < r < B, which are also the fundamental recur- 
rence relations employed in the LBANC and RECAL algo- 
rithms. These recursions are subject to the following termi- 
nation conditions: (i) G(to, N) = if any entry in N or 
m is negative; (ii) 0(0, 0) = 1, where = (0, 0, ... , 0). 
In classic algorithms, G{fh,N) is obtained by recursively 
evaluating one between (|4|i and Q until termination con- 
ditions are met. Following this approach, time and space 
requirements grow roughly as 0{N-^) if (|4]i is used (e.g., 
LBANC) and as O(iV*0 if © is used (e.g., RECAL). In 
practice, these costs are often prohibitive since in modeling 
modern systems it is not difficult to have N of the order of 
hundreds or thousands and min{A/, R} > 5 — 6 (see [18] 
for a recent case study), which make the storage require- 
ment of hundreds of gigabytes regardless of the recursion 
used. 

MoM avoids this memory inefficiency by observing that, 
if one considers a certain subset of normalizing constants 
V{N), which we call basis, then this basis can be computed 
recursively by jointly using (|4]i and ^ to define the matrix 



1 For normalizing constant methods such as LBANC, the computation 
focuses on im-normalized mean queue-lengths [20] . 



difference equation 

A{N)V{N) = 'B{N)V{N - Ir), (6) 

where V^(0) is known from the termination conditions of 
(|4]l-(|5]l, and the matrices A{N) and B(iV) are square of 
identical size. The matrices A{N) and B(7V) are defined 
by the coefficients of the equations (|4]i-(|5]l that relate all 
and only the normalizing constants in V{N) with those in 
V{N - Ir). The basis is: 

V{N) = {G{m', N),G{m', N-U), G{m' , N-1r^i) 
\m' = m+ [Si, . . . , 6m), R-l< Ek=A < R}, 

which is the set of normalizing constants of models where 
we have increased the elements of the vector to by i? or 
R — 1 units in all possible ways and where the models are 
evaluated over the populations N, N — li, . . ., N — 1r-i. 
The multiplicity increase operation is equivalent to add new 
queues to the model and, probabilistically, this can be in- 
terpreted as computing binomial moments of queue-lengths 
in the original queueing network [7, 8, 13]; hence one con- 
cludes that a recursive computation of V{N) is also a recur- 
sive evaluation of higher-order moments of queue-length. 
Indeed, the knowledge of V{N) is sufficient to compute all 
the normalizing constants used in ([T]i, see [7]; thus, comput- 
ing V{N) is equivalent to solve the model. 

The interest for ^ is that the matrix recursion is linear 
and does not branch exponentially like (|4|i-(|5]l, since we can 
progressively remove the elements of N without increas- 
ing the size of the V{-) vectors and until the termination 
condition V{0) is reached. If the linear system ^ is non- 
singular, one can compute V{N) = A^^(7V)B(7V) by an 
exact solution technique, like exact Gaussian elimination or 
the Wiedemann algorithrr0 which prevent the critical effects 
of round-off error accumulation when the recursion is eval- 
uated hundreds or thousands of times and also avoid numer- 
ical issues arising in normaUzing constant computations [8]. 
If the Wiedemann algorithm is used, the computational cost 
of linear system solution grows quadratically with the ba- 
sis size and as 0{N^ log A^) with respect to the total popu- 
lation, which is typically much less than the 0{N^) and 
0{N^'^) of classic methods. An example illustrating the 
MoM algorithm is given below, together with intuition on 
the MoM generalization proposed in this work. 



2 See, e.g., the LinBox open source library 
jhttp : / / www . linalg ■ orgi for a free implementation of 
the Wiedemann algorithm, exact Gaussian ehmination, and other ex- 
act methods that can be used to solve the MoM matrix difference 
equation. 



R added queues I + 2'1, 
R-1 added queues +1 , f +i i 

Figure 1 . Basis of normalizing constants V{N) for a 
modei with! M = 2 queues and R = 2 classes. Each circle 
represents a group of R normalizing constants of mod- 
els with populations N and TV — 1 1 . Labels indicate the in- 
crease of the multiplicity vector m relatively to that sub- 
set of normalizing constants. 



3. Motivating Example 

We begin by illustrating the structure of (|6l) on a sim- 
ple queueing network with M = 2 queues, R = 2 classes, 
a population N — {Ni,N2), and where m = (1, 1), i.e., 
all queues are distinct. To compact notation, let use denote 
dz,k,s = {mk + z) ■ Dk^s and G+^''' = G{m + !„ + U,N- 
Ic — Id) - Then, the linear system ^ has the following struc- 
ture: 



"1 --Di.i ■ 
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-1 




1 -D2,i ■ 
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do, 1,2 ■ C^l,2,2 

do, 1,2 ■ di 



Z2 
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B(JV) 



where • indicates a zero element, and the four blocks of 
the coefficient matrices represent from top: the CE (|4]i for 
= 1, the CE for fc = 2, the PC dSj for r = 1, and the 
PC for r = 2. The basis of normalizing constants is de- 
picted in Figure [T] We remark that, for each element in the 
figure, the basis includes both normalizing constants for the 
populations iV and TV — li, hence the total number of ele- 
ments is cardiyiN)) = 10. 

The fundamental observations presented in this paper to im- 
prove MoM and, specifically, to considerably reduce the 
cost of computing ViN), are as follows: 



1 . we first note that it is possible to add independent equa- 
tions to the above linear system by taking in considera- 
tion a generalization of the convolution expression dU 
explained later in the paper; this generalization pro- 
vides independent information and makes the linear 
system over-determined. 

2. we show that, if the linear system is over-determined, 
then the basis V(N^ can be defined smaller, while still 
preserving the capability of MoM of solving exactly 
queueing networks. The basis size reduction leads to 
remarkable computational savings compared to the 
original MoM approach. 

3. however, as we explain in Section |4] for models of 
arbitrary size the additional independent information 
comes at the price of additional recursions over mod- 
els with different number of queues. We investigate in 
the rest of the paper if accepting these additional recur- 
sions is convenient with respect to the computational 
savings implied by the basis size reduction. 

The previous observations are further illustrated in the next 
subsection. 

3.1. Improved Computation of the Basis of Nor- 
malizing Constants 

We begin by observing that (|4]i can be seen as a special- 
ization of the recursive equation used by the Convolution 
Algorithm [6,22], which we call the generalized convolu- 
tion expression (GCE) 

R 

G{m,N)=G{m-lk,N)+Y,Dk,rG{m,N-lr), (7) 

r=l 

for all 1 < fc < M. Here queues are removed through the 
parameter m — Ife, instead of being added as in (|4]i. This im- 
plies that a recursion involving ^ may also evaluate mod- 
els which contain less queues than in the original queue- 
ing network, while ^ operates on networks with multiplic- 
ity to' > to only. However, by instantiating d?) on a model 
with multiplicity to + 1^ instead of m, it is found that (|7]l 
becomes identical to (|4|i, thus specifies a subset of (|7|l- 
Whenever (|7]l is instantiated on models with less queues 
than in the original network, the information provided by 
dTjl is independent with respect to the one provided by Q, 
because the two equations are defined over models with dif- 
ferent network structure. For example, equation ^ may be 
added to the simple queueing network considered before if 
instantiated as 

G(to + 2 • li,iV) = G(to + 2- li - l2,N) 

R 

+ ^D2,rG{rn + 2-li,N -Ir). (8) 



In this case, the normalizing constant G(rn + 2 ■ li — 12, N) 
hes outside the basis V{N), thus equation ([8]l does not re- 
duce to a CE and provides independent information. Note 
also that G(m + 2 • 1 1 — l2,N) is the normalizing constant 
of a model where queue 2 has been completely removed 
since we have assumed m = (1, 1), thus it can be computed 
easily with closed-form formulas for the balanced network 
case [21] and therefore the addition of (|8j does not increase 
the number of unknowns in the linear system. 

The main idea investigated in this paper is that this in- 
dependent information can be exploited effectively to re- 
duce the size of the basis V{N). In fact, consider a new 
basis Vnew{N) composed by normalizing constants with 
R — 2 < J2k "^fc < i? — 1 additional queues instead of 
the R — 1 < J2k ™fc — as in the original definition of 
V{N). Then, using (HJl, (|5]l, and dHJ, we can define a lin- 
ear system with square matrix of coefficients 
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where the new vector Kietu(^) includes the normalizing 
constant G+^^+i^-^ = q(^^ + 2 • h - I2, TV) used in dH), 
and the blocks of the coefficient matrix are from the top: the 
CE for fc = 1, the CE for fc = 2, the GCE (O, the PC for 
r ~ \, and the PC for r = 2. The new linear system may be 
written compactly as 

A„e^(7V)F„™(iV) = V-Jl{N)+B^,^{N)Vnen,{N-lR) 

(9) 

with square coefficient matrix, thus if A^g\^ [N] exists the 
solution of the linear system (l9| provides an alternative 
way to recursively compute normalizing constants that is 
cheaper than the original linear system (|6|, since ^ halves 
the order of the coefficient matrix with respect to We 
stress that without ^ the new system (|9]l would be under- 
determined, thus resorting to the GCE equations is critical 
for this new approach. 

It is also important to remark that, for queueing networks 
larger than the one considered in this experiment, the nor- 
malizing constants in (N) may not be available from 
closed-form expressions. In this case, the computation of 
V~J^{N) requires additional recursions over models with 
different number of queues; we show in the next section 
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multi-branched recursion 

f ni- \ 
I (1,1,1) j 



1 queue 
(a) Single GCE 




(1,0,0) ) I (0,1,0) ) ( (0,0,1) 



(b) Multiple GCEs 



Figu re 2. structure of the MoM recursion after addition 
of a single or multiple GCEs Q on a model with M = 3 
queues. The label k = 1, e.g., indicates a GCE instan- 
tiated for k = 1 on all models with I added queues in 
the redefined basis. Labels within a circle indicate the 
multiplicity vector m on which the basis is defined, e.g., 
(1, 0, 1) is the model obtained from the original queue- 
ing network by removing queue 2. 



that, if multiple equations Q are used simultaneously, this 
yields a multi-branched recursive structure for the MoM al- 
gorithm, where one needs to evaluate recursively also mod- 
els with less queues that are not considered in the original 
MoM recursion. 

4. The Multi-Branched Method of Moments 

The integration of the GCE ([7]) into the MoM linear sys- 
tem can be done in different ways depending on the number 
of equations (|7| simultaneously instantiated into the matrix 
difference equation. As observed earlier, integrating GCEs 
into MoM allows to reduce the basis size; this reduction can 
be specified by a decrease in the number of queues added to 
the multiplicity vectors in the basis, which is equivalent to 
considering queue-length moments of smaller order Specif- 
ically, if the new basis Vnew (N) includes models with only 
I — 1 and I added queues, one can integrate a single or mul- 
tiple GCEs for each model with I additional queues on/}^. 
A comparison of the recursion trees arising from the two al- 
ternatives (single or multiple GCEs) is given in Figure |2] 

Using a single GCE implies an additional M recursions 
which first remove from the model queue M, followed by 
queue AI — 1, and so forth up to a trivial model with a single 
queue. Instead, using all possible GCEs implies that the ad- 



ditional recursions first consider all possible 



M 
M-1, 
M 



models 



with A/ — 1 queues, followed by all possible (j^^.j) models 



The GCE is not needed for models with I — 1 additional queues, since 
their normalizing constants are all immediately computed from the ba- 
sis for V(N ~ Ijj) using the PC of class R. 



with M — 2 queues, and so on up to models with a single 
queue. The latter approach appears to be the most expen- 
sive, at least if one ignores the basis size reduction, because 
it has a number of new recursions that grows combinatori- 
ally instead of linearly. Yet, while limiting to a single GCE 
seems a natural choice to control the number of new recur- 
sions, we have noted that in practice the additional informa- 
tion of the multiple GCEs implies a much larger reduction 
of the basis than in the case of a single GCE. This in turn 
provides computational savings often greater than the addi- 
tional overheads imposed by the extra recursions. Thus, in 
this section we investigate the trade-off imposed by differ- 
ent types of integrations of GCEs and consider the general 
case of simultaneously considering up to B, 1 < S < A/, 
GCEs in the linear system. We also provide a complexity 
analysis to evaluate the best choice of this branching fac- 
tor i? as a function of the other model parameters. 

4.1. Basis Reduction 

We now investigate the reduction of the basis cardinal- 
ity as a function of the number of GCE equations added to 
the MoM matrix difference equation. Indeed, the most in- 
teresting cases are 1 ) when (|7]i is added for a single value of 
k or 2) when aU possible equations in (|7]i are added; in fact, 
intermediate cases imply a combinatorial branching of the 
recursion and thus grow in computational complexity sim- 
ilarly to the second case. Let us define a basis of level I, 
Z > 1, as the set 

Vi{N) = {G{m\ N),G{m', N-U), G{m', N-1r-i) 
\m' = m+ ((5i, . . . , Sm), l-l< ElLih < I}, (10) 

which is the set of normalizing constants with I — 1 or I ad- 
ditional replicated queues. According to this definition, in 
MoM it is always V{N) = Vr{N), while the basis in the 
example of the last section after the addition of the GCE 
is Vnew{N) = Vr^i{N). A basis of level I has cardinal- 
ity card(yi{N)) — thus a decrease, thanks to 
the GCEs, of I even by a few units implies a quick combina- 
torial reduction of the number of elements in the basis. The 
next theorems are the fundamental result of this paper and 
exactly quantify the amount of this reduction. 

Theorem 1. The inclusion in the MoM matrix difference 
equation ^ of the GCEs ^ for k — 1, . . . , M on all mod- 
els having I additional queues in the basis Vi (N) allows to 
define a linear system of the type 

Ai{N)Vi{N) = Vi-Hn) + Bj(7V)l^(7V - \r), (11) 

which has more equations than unknowns if 
I > max{l,i? — M}. Therefore, the basis has mini- 
mum cardinality for I — max{l, R — M^. 



Proof A basis of level I has i^'^^^'^^R normaliz- 
ing constants, while the total number of CEs and PCs 
is (^^+\"^)(A/ + i? - 1) since there exist M CEs and 

R — \ PCs for each of the i^^^l^"^^ normalizing con- 
stant with / — 1 additional queues and all other possi- 
ble CEs and PCs require constants outside the basis. Thus 
we have that, in absence of GCEs, there are more equa- 
tions than unknowns in the matrix difference equation if 
+ i? - 1) > i?, which is true for 

all I > R. h\ particular, / = R gives the minimum car- 
dinality of the basis and for this reason it is the choice 
done by MoM for its basis V{N) = Vr{N) which ig- 
nores the GCEs. We now add to the previous condi- 
tion the number of additional GCEs which do not specialize 
into CEs and that we can formulate for models with I addi- 
tional queues, which is Y:t=i^''^ Of) (/-D -h). This 
can be explained as follows. Consider a model with normal- 
izing constant in Vi [N) and where we have added / queues. 
Denote by h the number of distinct queues among the I 
queues we have added. It is possible to see that removing 
any of these h queues using a GCE involves only normaliz- 
ing constants with I — 1 added queues that are already in the 
bases Vi (N) and Vi {N—1r}, thus these specific GCE equa- 
tions are identical to the CEs and do not provide indepen- 
dent information. Therefore, for a model with h distinct ad- 
ditional queues, only M — h GCEs are different from 
the existing CEs. Note that there are (^^) ways of choos- 
ing the h distinct queues and (''^^'Z^'^"^) = (/Z/J ways 
of adding I queues to the model chosen among these h dis- 
tinct ones under the constraint that each of the h queues is 
chosen at least once. Combining these expressions gives 
the number of GCEs that are not CEs, which simpli- 

fiestoErr^'^ (-)G'ZD(M-^) 
= {^+!;^){M + R~1) = (^^+/-2)M, where the first pas- 
sage follows by Vandermonde convolution [12]. 

Adding the number of GCEs that are not CEs, we evalu- 
ate the following condition for (fTTI) to have more equations 
than unknowns A/ + (^^,t'r^) {M + R - 1) 

> Suppose first R > A/ + 1 and thus / = 

max{l,i? — A/} = i? — M, then we consider the con- 
dition {^-!,)M + U^^'_,){M + R - 1) > {tD^ 
which using the property of binomial coefficients (^) = 
("it ^) + (fc-i) °" *^ right hand side gives {^Im)M + 
U^lt,){M + R-l) > (^«Z-^)ii+(^f-iJi? that sim- 
plifies to {j,'^^'_,){M - 1) > - M) which is 
actually an equality because, after expanding the binomial 
coefficients, both sides are found identical. Hence, since 
I = R — M always returns an equality between number of 
equations and number of unknowns, it is easy to verify that 
I < R — M would always give an under-determined sys- 
tem and thus I = R — M gives the minimum allowable ba- 



sis size for the case R > M + 1. 

Consider now the other case R < M+1 where / takes the 
minimum possible value ^ = max{l,i?— Af} = 1, we have 
then (Y) (M - 1) + (Af + f? - 1) > (*^) R which is 

equivalentto Af(Af-l) + (Af + f?-l) > Aff? and assum- 
ing the worst case f? = Af" + 1 we get Af (Af - 1) + 2Af > 
M{M + 1) which is always true because the two sides sim- 
plify to the same identical value. This means that the lin- 
ear system is always square if we use the minimum value 
I = 1 when f? < Af + 1. 

We can summarize the above findings saying that I = 
max{l,f? — Af} always implies a Ai{N) matrix that is 
square or over-determined and that smaller values of I in- 
stead result in under-determined systems for certain values 
of Af and R. This concludes the proof of the theorem. □ 

Theorem 2. The inclusion of a single GCE ^ for given k 
in the MoM matrix difference equation ^ allows to define 
a linear system similar to ( If ft , but which has more equa- 
tions than variables if I > max{l,f? — 1}. In particular, 
the basis has minimum cardinality for I — max{l, f? — 1}. 

Proof. The proof differs from that of Theorem [T] for the 
number of GCE equations that are not CEs. Suppose that 
GCEs for given k are used, and assume without loss of gen- 
erality that the GCE of station k = Af is the one included 
in the matrix difference equation. Then the number of GCEs 
that are not CEs is 



min{A/.i} 

E 

h=l 



I -I 
I - h 



min{A/-l.i-l} 

E 

h=l 



i-i 
I - h 



where the left term follows similarly to the number of GCEs 
in Theorem [T] but for the case where one GCE is added, 
instead of Af, to the models in Vi{N) with / additional 
queues. The right term counts instead the number of times 
this GCE is identical to an existing CE. The above ex- 
pression becomes simpler thanks to Vandermonde convo- 
lution [12] and gives that we have more equations than un- 
knowns if {^'+1-^) + (Af + f? - 1) > (*'+/"')f?. 

Now using (^) = ("^^) + (^3 J) on the right hand side we 
get + (*^+'-2)(Af + f? - 1) > (^"+/-')f? + 

(^'^+'~2)f?, which is equivalent to (*^+'j"^)(Af - 1) > 

(^^^/^) ~ 1) ■ expanding the binomial coefficients it 
is found that the two sides are identical if / = max{l, f?— 1} 
which completes the proof. □ 

The results in Theorem[T]and Theorem |2] show that: (1) 
if all GCEs are added to the MoM matrix difference equa- 
tion, then the basis level can be decreased by up to Af units; 
(2) if a single GCE is used, the basis level can instead be 
decreased by a single unit. Following the same line of the 
proofs of Theorem [T] and Theorem |2] it is then straightfor- 
ward to show the following coroUary. 



Corollary 1. IfB GCEs, 1 < B < M, are added to the ma- 
trix difference equation, then the basis can be decreased by 
up to B levels and the minimal basis size is obtained with 
the basis level I — min{l, R — B}. 

The next section investigates the computational implica- 
tions of the last result. 



5. Computational Complexity 

Corollary [Uenables the evaluation of the optimal choice 
of the branching factor B as a function of the model size. 
In practice, we are interested to understand when the addi- 
tional recursions implied by a branching factor B give an 
overhead that is less that the savings implied by the reduc- 
tion of the basis level from I = R of the original MoM to 
I = min{l, f? - 5} of MoM with GCEs. 

We first observe that if B GCEs are used in the MoM lin- 
ear system, then the basis Vf'^{N) in ( fTTT i is computed re- 
cursively from B bases of models with a queue less. These 
models have Af — 1 queues, thus the branching factor in this 
case is upper bounded by f? < Af — 1. That is, the maxi- 
mum number of GCEs added to the linear system changes 
according to the distance d, d = 0, . . . , Af — 1, in the re- 
cursion tree from the root (i.e., the original model). For 
d = Q, . . . , B — 1, only up to d GCEs can be added to the 
linear system, while for distances d — B , . . . , M we can al- 
ways add B GCEs. This can be seen immediately from Fig- 
ure |2] where the number of GCE equations instantiated for 
a model with three queues are three (fc = 1, fc = 2, and 
k = 3), two for a model with two queues, and they decrease 
progressively during the recursion. 

Starting from the previous consideration, we analyze be- 
low the computational complexity of MoM with GCEs for 
the two limit cases B = 1 and B = Af , and provide discus- 
sion about the intermediate cases 1 < f? < Af at the end of 
this subsection. 

r/we Requirements. If B = 1, then V^~'^(7V) is com- 
puted by Af recursions. During the c?th recursive step d = 
0, . . . , Af — 1, the model has M — d queues; the basis is al- 
ways of level I = R — 1 for all steps. Assuming quadratic 
costs in the solution of the linear system, e.g., using a 
method like the Wiedemann algorithm, we have that the 
time for computing V{N) from V{N — 1r) grows as 



d=0 



R-1 



(12) 



where the term between parenthesis is the coefficient matrix 
order in (fTTI) and 

St,,, « {N log(Af -d + N))(^^'^^f' ^'l R 



MOM+GCE - B=1 ^ MOM+GCE - B=M ^ Original MOM 




Figure 3. Time requirements of IVIolVI and thie divide-and-conquer lUlolUI for different number of queues M and number of ser- 
vice classes R. All queues are assumed distinct. The results indicate that assuming a branching level B = M \s far supe- 
rior to B = I, unless a small number of queues is considered in the model (M < 4). The total population in the network is set 

to Af = 100. 



is the overhead of exact algebra for a model with M — d 
queues and assuming that the linear system solver uses mul- 
tiprecision arithmetic [7]. In the expression ( fT2] l we have 
ignored the exact number of iterations of the solution algo- 
rithm and thus the expression may be regarded as a cost per 
iteration of the linear system solver. 

In the case where we use all possible GCEs, it is _B = M 
at the first recursive step, then B = M — 1 at the sec- 
ond step, and B = Af — d at the dth recursive stejfl In 
addition, the level used at the dth step of the recursion is 
I = l{d) — max{l, R — M + d}, which is thus a function 
of the distance d from the root of the recursion tree. Follow- 
ing these observations, the time requirements grow as 



M 



M-l 

d=Q 



M -d+l{d) - 1 
l{d) 



R 



exact 



where l{d) = max{l, R — M + d} and the term (j^^^) ac- 
counts for the combinatorial branching of the recursion and 
is the number of all possible queueing network models with 
M — d distinct queues chosen among the initial M. For ex- 
ample, when M < R 



M-l 

E 

d=0 



M 

M -d 



R-l 
R-M + 



d j I '^exact 



which is significantly smaller than (fTSl) since the binomial 
coefficient does not longer depend on the sum of M and R. 



4 This observation holds trae under the assumption that the model is 
composed initially by queues that have all multiplicity = 1, i.e., 
which are all distinct. The case of models with replicated queues has 
more favorable computational costs if the total number of queues (in- 
cluding the non-repHcated ones) is the same, thus our analysis is a 
worst-case scenaiio when M is interpreted as the total number of 
queues instead of the number of distinct ones. 



Similarly to the case B = \, the time requirements ex- 
pression is a cost per solver iteration and the term raised to 
square is the linear system order. Compared to the above ex- 
pressions, the original MoM algorithm has a time require- 
ment per iteration of 



M - 



-R-l 
R 



R\ S° 



(13) 



Figure [3] quantifies the savings per solver iteration of the 
new algorithm for B = 1 and B = M compared to the 
costs of the original MoM. Since the costs are dependent on 
A/, R, and the population size N, we simplify the evalua- 
tion and consider the variation of M and R under a quite 
large N = 100. The cost surfaces indicate that the algo- 
rithm with B = M is typically the most efficient except for 
very low values of M where it is much more expensive than 
B ~ 1 and the original MoM, although the cost per iteration 
remains quite small. Overall, the savings of the B = 1 case 
are quite limited compared to the original MoM, while mas- 
sive cost reduction is achieved with the multi-branched case 
B — M. This is quite counter-intuitive, since one would at 
first expect that the wide recursion tree in Figure |2b) is 
a major source of computational cost compared to the lin- 
ear recursive structure in Figure|2ja). Yet, Figure[3]indicates 
that, for multiclass models that can be solved in acceptable 
times with commonly-available hardware, the cost of the 
combinatorial branching in Figure Wih) is not yet a perfor- 
mance bottleneck and it is justified by the massive compu- 
tational saving of the basis size reduction. 

Space Requirements. The space requirement of the case 
B = 1 is upper bounded by the cost of storing the linear 
system ( fTTT i in memory when it is largest, i.e., for the origi- 




Figure 4. Space requirements of MoM and the divide-and-conquer MoM for different number of queues M and number of 
service classes R. The interpretation of the results is qualitatively similar to the one for the time requirements in Figure[3l 
with the best branching level being B = M unless the number of queues M is small. The total population in the network is 

set to Af = 100. 



nal model with M queues. This is approximately given by 



M + R-2 
R-l 



R 



M + R-2 
R-l 



R I S",,^ 



(14) 

The evaluation of memory requirements in the case B = 
M is similar, but requires to take into account the width of 
the multi-branched recursion tree, since all basis vectors for 
models with fc — 1 queues should be available before the 
evaluation of models with k queues. Thus, the memory oc- 
cupation is 



M 

max , 

d=i,...,M \ M - d 




(15) 



where the first term is the cost of storing A{N) and 'B{N) 
for the currently evaluated linear system, while the second 
term accounts for the basis for populations N and N — Ir 
of all models at distance d from the root of the recursion. 

Finally, the computational costs of the original MoM are 
given by [7] 



M + R-1 
R 



R 



M + R- 2 
R-l 



(16) 

which is quite similar to the cost of the case B = 1. 

The comparison of the space requirements of the three 
different methods is shown in Figure H] for different val- 
ues of M and R\ we set again the total population to 
N — 100. Results are qualitatively similar to the time re- 
quirement case: the GCE equations provide the largest sav- 
ings in space requirements compared to the original MoM 



only if B — M. The case B = 1 is 1 — 2 orders of mag- 
nitude faster then the original MoM for models with few 
queues {M < 4), while as M increases the algorithm with 
B = M scales much better. In particular, for the most chal- 
lenging model with M = 11 and R = 11, the computa- 
tional saving of the modified algorithm with B = M is 
about four orders of magnitude over the original MoM, thus 
making the case that the inclusion of the GCE equations is 
highly-valuable also for the space requirements. 

Intermediate cases 1 < B < M. Following the result in 
Corollary[T]it is immediately found that the size of the basis 
for intermediate choices of the branching level B is always 
bounded by the choices B = 1 and B = M and computa- 
tional requirements are typically within those of these limit 
cases. For example, assume that B queues are chosen for re- 
moval and the multi-branched recursion is operated only on 
these queues such that the recursion is terminated by solv- 
ing with the original MoM models with M — B queues. In 
this case, we have found that the computational costs of the 
choices B — 1 and B = M are always better than these in- 
termediate cases, unless M — B = 1. Yet, in this more fa- 
vorable cases, the costs of the intermediate choice of B have 
the same order of magnitude of the best between B = 1 
and B = M, therefore the savings of these intermediate 
cases seem marginal and do not motivate a specialized im- 
plementation of the algorithm. As a result, we believe that 
the multi-branched recursion approach is best implemented 
with a choice B — M which provides the biggest savings 
with respect to the original MoM on the largest number of 
choices of M and R. 

Comparison with MVA Algorithm. As a final remark, re- 
gardless of the branching level B used, the computation of 
V{N) from ^ (0) has an 0{N^ log N) time complexity and 
an 0{N log N) space complexity as the total population N 



grows. Since MVA is 0{N^) in time and space complex- 
ities, it is immediately clear that for sufficiently large pop- 
ulations MoM is always faster and less memory consum- 
ing than MVA. Savings are obtained by MoM already for 
populations composed by few tens of jobs [7]. Therefore, 
since the original MoM is already much more scalable than 
MVA, it is an immediate consequence that the generalized 
MoM with GCEs, which always performs better than MoM, 
will be always several orders of magnitude more efficient 
than MVA or other methods such as RECAL or LBANC. 
We point to [7] for a comparison of the original MoM with 
these methods supporting the statements in this subsection. 

6. Conclusions 

In this paper, we have presented a generalization of the 
Method of Moments (MoM), a recently proposed algorithm 
for the exact analysis of multiclass queueing network mod- 
els which are widely used in capacity planning of computer 
systems and networks [7,8]. We have integrated in the MoM 
equations also the recursive formula used in the Convolu- 
tion Algorithm [6, 22], here called the generahzed convolu- 
tion equation (GCE). We have shown that using the GCE 
in MoM significantly changes the structure of its recursion 
leading to the evaluation of models with different number of 
queues and which can be solved much more efficiently than 
the larger models considered by MoM. As a result, the com- 
putational costs in time and space of the generalized algo- 
rithm are several orders of magnitude smaller than the orig- 
iniil MoM recursion. 

As a possible extension of this work, we believe that the 
Convolution Algorithm equation considered in this paper 
could benefit also the Class-Oriented Method of Moments 
(CoMoM) algorithm presented in [8]. This algorithm can 
be seen as the dual of the MoM algorithm, where the basis 
of normalizing constants considered in the recursion is de- 
fined in such a way that a different tradeoff between number 
of queues and classes is considered and this favors the solu- 
tion of models with many classes compared to the original 
MoM. The generaUzation of CoMoM with GCEs could pos- 
sibly further enhance its scalability on models with many 
classes. 
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